Domain Adaptation for Statistical Machine Translation of Corporate and User-Generated Content

Authors

  • Pratyush Banerjee
  • Josef van Genabith

Similar resources

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling

This paper reports experiments on adapting components of a Statistical Machine Translation (SMT) system for the task of translating online user-generated forum data from Symantec. Such data is monolingual, and differs from available bitext MT training resources in a number of important respects. For this reason, adaptation techniques are important to achieve optimal results. We investigate the ...

The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content

With the development of Web 2.0, a lot of content is nowadays generated online by users. Due to its characteristics (e.g., use of jargon and abbreviations, typos, grammatical and style errors), the user-generated content poses specific challenges to machine translation. This paper presents an online platform devoted to the pre-editing of user-generated content and its post-editing, two main typ...

Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities

This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, by highlighting extant obstacles and identifying relevant technologies to overcome them. In this sense, it firstly proposes a typology of e-commerce static and dynamic textual genres and it identifies those that may be more successfully targeted by SMT. The specifi...

Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data?

This paper reports a set of domain adaptation techniques for improving Statistical Machine Translation (SMT) for user-generated web forum content. We investigate both normalization and supplementary training data acquisition techniques, all guided by the aim of reducing the number of Out-Of-Vocabulary (OOV) items in the target language with respect to the training data. We classify OOVs into a s...

Translation-based ranking in cross-language information retrieval

Today’s volume of user-generated, multilingual textual data creates a need for information processing systems in which cross-linguality, i.e., the ability to work on more than one language, is fully integrated into the underlying models. In the particular context of Information Retrieval (IR), this amounts to ranking and retrieving relevant documents from a large repository in language A, given...

Publication date: 2013